Lecture 3 (Part 1)



Topics covered: 
CPU Architecture 



Fetch/execute cycle of an instruction

□ Step I:

♦ Fetch the contents of the memory location pointed to by 
Program Counter (PC). 

♦ PC points to the memory location which has the instruction to 
be executed. 

♦ Load the contents of the memory location into Instruction 
Register (IR). 

□ Step II: 

♦ Increment the contents of the PC by 4 (assuming the memory is 
byte addressable and the word length is 32 bits). 

□ Step III: 

♦ Carry out the operation specified by the instruction in the IR.

□ Steps I and II constitute the fetch phase, and are repeated
as many times as necessary to fetch the complete
instruction.

□ Step III constitutes the execution phase. 
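
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary-based memory, the accumulator `acc`, and the two opcodes `ADDI` and `HALT` are hypothetical and are not part of the lecture.

```python
WORD_BYTES = 4  # 32-bit words in a byte-addressable memory


def run(memory, pc=0):
    """Repeat fetch (Steps I and II) and execute (Step III) until HALT."""
    acc = 0  # hypothetical accumulator, only here to make execution visible
    while True:
        ir = memory[pc]        # Step I: fetch the word addressed by PC into IR
        pc += WORD_BYTES       # Step II: advance PC to the next word
        opcode, operand = ir   # Step III: carry out the operation held in IR
        if opcode == "HALT":
            return acc, pc
        if opcode == "ADDI":
            acc += operand
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Hypothetical three-word program stored at addresses 0, 4 and 8.
program = {0: ("ADDI", 5), 4: ("ADDI", 7), 8: ("HALT", 0)}
result, final_pc = run(program)
```

Note that the PC already points at the following word while the current instruction executes, which is why Step II can be done during the fetch phase.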

Internal organization of a processor



□ Recall that a processor has several registers/building 
blocks: 

♦ Memory address register (MAR) 

♦ Memory data register (MDR) 

♦ Program Counter (PC) 

♦ Instruction Register (IR) 

♦ General purpose registers R0 - R(n-1)

♦ Arithmetic and logic unit (ALU) 

♦ Control unit. 

□ How are these units organized and how do they communicate
with each other?

Single bus organization

□ Single bus organization: 

♦ ALU, control unit and all the registers are connected via a 
single common bus. 

♦ Bus is internal to the processor and should not be confused 
with the external bus that connects the processor to the 
memory and I/O devices. 

□ Data lines of the external memory bus are connected to the
internal processor bus via MDR.

♦ Register MDR has two inputs and two outputs. 

♦ Data may be loaded to (from) MDR from (to) internal processor 
bus or external memory bus. 

□ Address lines of the external memory bus are connected to 
the internal processor bus via MAR. 

♦ MAR receives input from the internal processor bus. 

♦ MAR provides output to external memory bus. 

Single bus organization (contd..)



□ Instruction decoder and control logic block, or control unit 
issues signals to control the operation of all units inside the 
processor and for interacting with the memory bus. 

♦ Control signals depend on the instruction loaded in the 
Instruction Register (IR) 

□ Outputs from the control logic block are connected to: 

♦ Control lines of the memory bus. 

♦ ALU, to determine which operation is to be performed.

♦ Select input of the multiplexer MUX to select between 
Register Y and constant 4. 

♦ Control lines of the registers, to select the registers. 

Single bus organization (contd..)



□ Registers Y, Z and TEMP:

♦ Used by the processor for temporary storage during execution
of some instructions.

♦ Note that Registers R0 to R(n-1) are used to store data
generated by one instruction for later use by another
instruction.

♦ Data is stored in R0 through R(n-1) after the execution of an
instruction.

□ Multiplexer MUX selects either the output of register Y or
constant 4, depending upon the control input Select.

♦ Constant 4 is used to increment the value of the PC.

Registers and the bus



□ A bus may be viewed as a collection of parallel wires. 

□ Buses have no memory: 

♦ They are just a collection of wires. 

□ When data is on the bus, all registers can "see" that data at 
their inputs. 

□ A register may place its contents onto the bus. 

Registers and the bus (contd..)



□ At any one time, only one register may output its contents to 
the bus: 

♦ Which register outputs its content to the bus is determined by 
the control signal issued by the control logic. 

♦ Control signal depends on the instruction loaded in the 
instruction register IR. 

□ Registers can load data from the bus: 

♦ Which registers load data from the bus is determined by the 
control signal issued by the control logic. 

□ Registers are clocked (sequential) entities (unlike the ALU, which
is purely combinational).

Registers and the bus (contd..)

Registers are connected to the bus via
switches controlled by the signals
Riin and Riout.

Each register Ri has two control signals,
Riin and Riout.

If Riin = 1, the data from the bus is loaded
into the register.

If Riout = 1, the data from the register is
loaded onto the bus.

The same holds for registers Y and Z as
well.

• Each bit in a register may be implemented by an edge-triggered D flip-flop.

• A two-input multiplexer is used to select the data applied to the input of an
edge-triggered flip-flop.

• The Q output of the flip-flop is connected to the bus via a tri-state gate.

Registers and the bus (contd..)

Riin = 1:

Multiplexer selects the data on the bus. 

Data is loaded into the flip-flop at the rising edge of the clock. 

Riin = 0: 

Multiplexer feeds back the value currently stored in the flip-flop. 
Q output represents the value currently stored in the flip-flop. 

Registers and the bus (contd..)

[Figure: input and output gating for one register bit. Labels: Bus, Riin, Clock, D flip-flop (D, Q), Riout.]

Riout = 1:

Tri-state gate places the value of the flip-flop onto the bus.
Riout is asserted at the start of the clock cycle, so the data appears on the bus shortly after the rising edge of the clock.

Riout = 0:

Gate's output is in the high-impedance (electrically disconnected) state.
Corresponds to the open-circuit state.

Registers and the bus (contd..)

Operation of a tri-state gate

• A tri-state gate can enter one of three output states:

- its output can be in a logic low state (L).

- its output can be in a logic high state (H).

- its output can be effectively an open circuit (high impedance).

• When a tri-state gate connected to a bus is in the high-impedance state, its output
is effectively disconnected from the bus.

Riout = 1, output is:

- Logic low, if Q = 0

- Logic high, if Q = 1

Riout = 0, output is:

- High impedance (open-circuit condition)
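
The gating of a single register bit described above (two-input multiplexer, edge-triggered D flip-flop, tri-state gate) can be modelled roughly as follows. This is a behavioural sketch, not a circuit description; the class name `RegisterBit` and the use of `None` to stand for the high-impedance state are assumptions made for illustration.

```python
HIGH_Z = None  # stands for the high-impedance (disconnected) output state


class RegisterBit:
    """One bit of register Ri: 2-input mux -> D flip-flop -> tri-state gate."""

    def __init__(self, q=0):
        self.q = q  # value currently stored in the flip-flop

    def rising_edge(self, bus, r_in):
        # Mux: take the bus value when Riin = 1, otherwise feed Q back.
        d = bus if r_in else self.q
        self.q = d  # the flip-flop captures D at the rising clock edge

    def output(self, r_out):
        # Tri-state gate: drive Q onto the bus when Riout = 1, else disconnect.
        return self.q if r_out else HIGH_Z


bit = RegisterBit(q=0)
bit.rising_edge(bus=1, r_in=0)  # Riin = 0: the stored value is kept
kept = bit.q
bit.rising_edge(bus=1, r_in=1)  # Riin = 1: the bus value is loaded
loaded = bit.q
driven = bit.output(r_out=1)    # Riout = 1: Q appears on the bus
floating = bit.output(r_out=0)  # Riout = 0: gate is disconnected
```

Feeding Q back through the multiplexer is what lets the flip-flop be clocked on every cycle while still holding its value when Riin = 0.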

Registers and the bus (contd..)

Operation of an edge-triggered flip-flop

• Data is loaded from the register to the bus (or to the register from the bus)
at the rising edge of the clock.

• Data is loaded at the L-H transition of the clock.

Registers and the bus (contd..)



□ Data transfers and operations take place within time periods 
defined by the processor clock. 

♦ Time period is known as the clock cycle. 

□ At the beginning of the clock cycle, the control signals that 
govern a particular transfer are asserted. 

♦ For example, if the data are to be transferred from register R0 to
the bus, then R0out is set to 1.

□ Edge-triggered flip-flop operation explained earlier used 
only the rising edge of the clock for data transfer. 

♦ Other schemes are possible, for example, data transfers may 
use rising and falling edges of the clock. 

□ When edge-triggered flip-flops are not used, two or more
clock signals may be needed to guarantee proper transfer of
data. This is known as multiphase clocking.

Simple register transfer example

Transfer the contents of register R3 to register R4

1. Control signals R3out and R4in become 1. They stay valid until the end of
the clock cycle.

2. After a small delay, the contents of R3 are placed onto the bus. The contents
of R3 stay on the bus until the end of the clock cycle.

3. At the end of the clock cycle, the data on the bus is loaded into R4. R3out
and R4in become 0.

Loading multiple registers from the bus

Transfer the contents of register R3 to registers R4 and R5

1. Control signals R3out, R4in and R5in become 1. They stay valid until the end of
the clock cycle.

2. After a small delay, the contents of R3 are placed onto the bus. The contents
of R3 stay on the bus until the end of the clock cycle.

3. At the end of the clock cycle, the data on the bus is loaded into R4 and R5.
R3out, R4in and R5in become 0.

Loading multiple registers from the bus (contd..)



□ It is possible to load multiple registers simultaneously from 
the bus. 

♦ For example, transfer the contents of register R3 to registers R4
and R7 simultaneously.

□ The number of registers that can be simultaneously loaded 
depends on: 

♦ Drive capability (fan-out) 

♦ Noise. 

♦ Note that this is an electrical issue, not a logical issue. 

□ Distinguish this from multiple registers loading the bus: 

♦ For example, load the contents of registers R3 and R4 onto the bus
simultaneously.

♦ Logically inconsistent event. 

♦ Physically dangerous event. 
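
The difference between several registers loading from the bus and several registers driving the bus can be shown with a small model. This is a sketch under assumptions: the class `Bus`, its `cycle` method, and the choice to raise `ValueError` on a conflict are hypothetical, and electrical limits such as fan-out are not modelled.

```python
class Bus:
    """Internal processor bus: no memory, just the value driven in this cycle."""

    def __init__(self, registers):
        self.regs = dict(registers)

    def cycle(self, out, ins):
        """Run one clock cycle with the given Rout and Rin signals asserted."""
        if len(out) != 1:
            # Zero drivers leave the bus floating; two or more is a conflict.
            raise ValueError(f"exactly one register may drive the bus, got {out}")
        value = self.regs[out[0]]  # contents appear on the bus after a small delay
        for name in ins:           # every register with Rin = 1 loads at cycle end
            self.regs[name] = value
        return value


bus = Bus({"R3": 42, "R4": 0, "R5": 0})
bus.cycle(out=["R3"], ins=["R4", "R5"])  # R3out, R4in, R5in: allowed

try:
    bus.cycle(out=["R3", "R4"], ins=[])  # R3out and R4out together: rejected
    conflict = False
except ValueError:
    conflict = True
```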

Arithmetic Logic Unit (ALU)

□ ALU is a purely combinational device:

♦ It has no memory or internal storage.

□ It has 2 input vectors:

♦ These may be called the A- and B-vector or the R- and S-vector.

♦ The inputs are as wide as the registers/system bus (e.g., 16 or 32
bits).

□ It has 1 output vector:

♦ Usually denoted F.

Arithmetic Logic Unit (ALU) (contd..)

Sample functions performed by the ALU

• F = A + B                F = A + B + 1

• F = A - B                F = A - B - 1

• F = A and B              F = A or B

• F = not A                F = not B

• F = not A + 1            F = not B + 1

• F = (not A) and B        F = A and (not B)

• F = A xor B              F = not (A xor B)

• F = A                    F = B
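
Several of the sample functions above can be written as a lookup table of pure functions, which mirrors the fact that the ALU is combinational. This is a sketch only: the 32-bit width, the dictionary `ALU_FUNCTIONS`, and the string keys are assumptions chosen for illustration.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1  # keep results to the assumed register width

ALU_FUNCTIONS = {
    "A+B":     lambda a, b: a + b,
    "A+B+1":   lambda a, b: a + b + 1,
    "A-B":     lambda a, b: a - b,
    "A-B-1":   lambda a, b: a - b - 1,
    "A and B": lambda a, b: a & b,
    "A or B":  lambda a, b: a | b,
    "not A":   lambda a, b: ~a,
    "not A+1": lambda a, b: ~a + 1,  # two's complement negation of A
    "A xor B": lambda a, b: a ^ b,
    "A":       lambda a, b: a,
    "B":       lambda a, b: b,
}


def alu(op, a, b):
    """Combinational: the output F depends only on the current inputs."""
    return ALU_FUNCTIONS[op](a, b) & MASK


total = alu("A+B", 3, 4)
wrapped = alu("A-B", 3, 4)        # -1 wraps to all ones in 32 bits
negated = alu("not A+1", 1, 0)    # -1 again, via not A + 1
```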

Arithmetic and Logic Unit (ALU) (contd..)



ALU connections to the bus

• ALU must have only one input connection
from the bus.

• The other input must be stored in a holding
register called the Y register.

• A multiplexer selects between register Y and the constant 4,
depending upon the select line.

• One operand of a two-operand instruction must
be placed into the Y register before the other
operand is placed onto the bus.

Arithmetic and Logic Unit (ALU) (contd..)

ALU connections to the bus

• Identical reasoning tells us that there must
be an output register Z which collects the
output of the ALU at the end of each cycle.

• This way, there can be
- one operand in the Y register
- one operand on the bus
- the result stored in the Z register

Performing an arithmetic operation

Add the contents of registers R1 and R2 and place the result in R3.

That is: R3 = R1 + R2

1. Place the contents of register R1 into the Y register in the first
clock cycle.

2. Place the contents of register R2 onto the bus in the second
clock cycle.

Both inputs to the ALU are now valid. Select register Y, and
assert the ALU command F = A + B.

3. In the third clock cycle, the Z register has latched the output of the
ALU. Thus the contents of the Z register can be copied into
register R3.
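
The three clock cycles above can be traced with a short sketch. The function name `add_r1_r2`, the dictionary of registers, and the 32-bit mask are assumptions for illustration; the signal names in the comments follow the Riin/Riout convention used earlier.

```python
def add_r1_r2(regs):
    """Trace the three clock cycles of R3 = R1 + R2 on a single bus."""
    regs = dict(regs, Y=0, Z=0)
    trace = []

    # Cycle 1: R1out, Yin
    bus = regs["R1"]
    regs["Y"] = bus
    trace.append(("R1out, Yin", bus))

    # Cycle 2: R2out, SelectY, Add, Zin
    bus = regs["R2"]
    regs["Z"] = (regs["Y"] + bus) & 0xFFFFFFFF  # ALU: A = Y via MUX, B = bus
    trace.append(("R2out, SelectY, Add, Zin", bus))

    # Cycle 3: Zout, R3in
    bus = regs["Z"]
    regs["R3"] = bus
    trace.append(("Zout, R3in", bus))
    return regs, trace


final, trace = add_r1_r2({"R1": 10, "R2": 32, "R3": 0})
```

Each cycle places exactly one value on the bus, which is why the addition needs Y to hold the first operand and Z to hold the result.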

Performing an arithmetic operation (contd..)

[Figure: single-bus organization of the datapath. Labels: PC, MAR (address lines of the memory bus), MDR (data lines of the memory bus), Y, Constant 4, MUX (Select), ALU (control lines: Add, Sub, XOR), instruction decoder and control logic, IR, control signals, R1, R2, R3.]

Clock cycle 4:

R3 has the sum.

Performing an arithmetic operation (contd..)

Clock Cycle 1:

R1out, Yin (Y = R1)

Clock Cycle 2:

R2out, SelectY, Add, Zin (Z = R1 + R2)

Clock Cycle 3:

Zout, R3in (R3 = Z)

Performing an arithmetic operation (contd..)



□ Inputs of the ALU: 

♦ Input B is tied to the bus. 

♦ Input A is tied to the output of the multiplexer. 

□ Output of the ALU: 

♦ Tied to the input of the Z register. 

□ Z register: 

♦ Input tied to the output of the ALU. 

♦ Output tied to the bus. 

♦ Unlike Ri, Z loads data from the output of the ALU and not
the bus.

ALU operations

□ RC = RA op RB

□ Clock cycle 1:

♦ Move RA to the Y register.

□ Clock cycle 2:

♦ Put RB on the bus, perform F = RA op RB, and transfer the
result to Z.

♦ RBout, SelectY, (RA op RB), Zin

□ Clock cycle 3:

♦ Put Z on the bus, and load Z into RC.



Fetching a word from memory

□ Processor has to specify the address of the memory location 
where this information is stored and request a Read 
operation. 

□ Processor transfers the required address to MAR. 

♦ Output of MAR is connected to the address lines of the 
memory bus. 

□ Processor uses the control lines of the memory bus to
indicate that a Read operation is needed.

□ Requested information is received from the memory and
is stored in MDR.

♦ From MDR it can be transferred to other registers.

Fetching a word from memory (contd..)

Connections for register MDR

MDRinE and MDRoutE control the
connection to the external bus.

MDRin and MDRout control the
connection to the internal bus.

Fetching a word from memory (contd..)



□ Timing of the internal processor operations must be 
coordinated with the response time of memory Read 
operations. 

□ Processor completes one internal data transfer in one clock 
cycle. 

□ Memory response time for a Read operation is variable and 
usually longer than one clock cycle. 

♦ Processor waits until it receives an indication that the 
requested Read has been completed. 

♦ Control signal Memory Function Completed (MFC) is used for 
this purpose. 

♦ MFC is set to 1 by the memory to indicate that the contents of 
the specified location have been read and are available on the 
data lines of the memory bus. 

Fetching a word from memory (contd..)

MOVE (R1), R2

1. Load the contents of Register R1 into MAR.

2. Start a Read operation on the memory bus.

3. Wait for MFC response from the memory.

4. Load MDR from the memory bus.

5. Load the contents of MDR into Register R2.

Steps can be performed
separately; some may be
combined.

1. Steps 1 and 2 can be combined.

- Load R1 into MAR and activate the Read control signal simultaneously.

2. Steps 3 and 4 can be combined.

- Activate control signal MDRinE while waiting for the MFC response from
the memory.

3. Last step loads the contents of MDR into Register R2.

Hence, the Memory Read operation takes 3 steps.

Fetching a word from memory (contd..)

MOVE (R1), R2: Memory Read operation takes 3 steps.

Step 1:

- Place R1 onto the internal processor bus.

- Load the contents of the bus into MAR.

- Activate the Read control signal.
- R1out, MARin, Read

Step 2:

- Wait for MFC from the memory.

- Activate the control signal to load data from the external bus into MDR.
- MDRinE, WMFC

Step 3:

- Place the contents of MDR onto the internal processor bus.

- Load the contents of the bus into Register R2.

- MDRout, R2in
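
The three control steps above, including the wait for MFC, can be simulated as follows. This is a sketch under assumptions: the `Memory` class, its fixed `latency` in clock cycles, and the cycle counting are hypothetical and only illustrate that Step 2 is stretched until the memory responds.

```python
class Memory:
    """Hypothetical memory that needs `latency` clock cycles per Read."""

    def __init__(self, contents, latency=3):
        self.contents = contents
        self.latency = latency
        self.address = None
        self.remaining = None

    def start_read(self, address):
        self.address = address
        self.remaining = self.latency

    def tick(self):
        """Advance one clock cycle; return (MFC, value on the data lines)."""
        self.remaining -= 1
        if self.remaining <= 0:
            return 1, self.contents[self.address]
        return 0, None


def move_indirect(regs, memory):
    """MOVE (R1), R2 as three control steps; returns clock cycles used."""
    cycles = 0
    # Step 1: R1out, MARin, Read
    mar = regs["R1"]
    memory.start_read(mar)
    cycles += 1
    # Step 2: MDRinE, WMFC (the step lasts until MFC = 1)
    mfc = 0
    while not mfc:
        mfc, data = memory.tick()
        cycles += 1
    mdr = data
    # Step 3: MDRout, R2in
    regs["R2"] = mdr
    cycles += 1
    return cycles


regs = {"R1": 0x100, "R2": 0}
cycles = move_indirect(regs, Memory({0x100: 99}, latency=3))
```

With the assumed latency of 3 cycles, the three control steps occupy five clock cycles in total, which shows that a control step and a clock cycle are not the same thing once memory is involved.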

Storing a word into memory

MOVE R2, (R1): the memory operation takes 3 steps.

Step 1:

- Place R1 onto the internal processor bus.

- Load the contents of the internal processor bus into MAR.

- R1out, MARin

Step 2:

- Place R2 onto the internal processor bus.

- Load the contents of the internal processor bus into MDR.

- Activate Write operation.

- R2out, MDRin, Write

Step 3:

- Place the contents of MDR onto the external memory bus.

- Wait for the memory write operation to be completed (MFC).

- MDRoutE, WMFC

Execution of a complete instruction

Add the contents of a memory location pointed to by Register R3
to register R1.

ADD (R3), R1

To execute the instruction we must execute the following tasks:

1. Fetch the instruction.

2. Fetch the operand (contents of the memory location pointed to by R3).

3. Perform the addition.

4. Load the result into R1.

Execution of a complete instruction (contd..)

Task 1: Fetch the instruction

Recall that:

- PC holds the address of the memory location which has the next
instruction to be executed.

- IR holds the instruction currently being executed.

Step 1

- Load the contents of PC into MAR.

- Activate the Read control signal.

- Increment the contents of the PC by 4.
- PCout, MARin, Read, Select4, Add, Zin

Step 2

- Update the contents of the PC.

- Copy the updated PC to Register Y (useful for Branch instructions).

- Activate the control signal to load data from the external bus into MDR.

- Wait for MFC from memory.

- Zout, PCin, Yin, MDRinE, WMFC

Step 3

- Place the contents of MDR onto the bus.

- Load the IR with the contents of the bus.

- MDRout, IRin



Execution of a complete instruction (contd..)

Task 2. Fetch the operand (contents of memory pointed to by R3).
Task 3. Perform the addition.
Task 4. Load the result into R1.

Step 4: - Place the contents of Register R3 onto the internal processor bus.

- Load the contents of the bus into MAR.

- Activate the Read control signal.

- R3out, MARin, Read

Step 5: - Place the contents of R1 onto the bus.

- Load the contents of the bus into Register Y (recall one operand in Y).

- Wait for MFC.

- R1out, Yin, MDRinE, WMFC

Step 6: - Load the contents of MDR onto the internal processor bus.

- Select Y and perform the addition.

- Place the result in Z.

- MDRout, SelectY, Add, Zin

Step 7: - Place the contents of Register Z onto the internal processor bus.
- Place the contents of the bus into Register R1.

- Zout, R1in

Execution of a complete instruction (contd..)

Step  Action

1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End
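
The seven-step control sequence for ADD (R3), R1 can be checked with a very small interpreter. This is a simplified sketch and not the processor's actual control logic: the function `run_sequence`, the integer used as an instruction word, the sample addresses, and the shortcut of delivering memory data to MDR in the step that carries WMFC are all assumptions made for illustration.

```python
MASK = 0xFFFFFFFF

ADD_R3_R1 = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R3out", "MARin", "Read"],
    ["R1out", "Yin", "WMFC"],
    ["MDRout", "SelectY", "Add", "Zin"],
    ["Zout", "R1in", "End"],
]


def run_sequence(steps, regs, memory):
    """Execute control steps; each step is a list of asserted signal names."""
    r = dict(regs)
    for signals in steps:
        s = set(signals)
        if "WMFC" in s:  # simplification: the memory data reaches MDR here
            r["MDR"] = memory[r["MAR"]]
        # Exactly one ...out signal places a register on the bus.
        (src,) = [x[:-3] for x in s if x.endswith("out")]
        bus = r[src]
        z_next = None
        if "Add" in s:  # ALU: input A from the MUX, input B from the bus
            a = 4 if "Select4" in s else r["Y"]
            z_next = (a + bus) & MASK
        for x in s:  # registers with Rin = 1 load from the bus at cycle end
            if x.endswith("in") and x != "Zin":
                r[x[:-2]] = bus
        if "Zin" in s:  # Z loads from the ALU output, not from the bus
            r["Z"] = z_next
    return r


INSTRUCTION_WORD = 0x0ADD31  # hypothetical encoding of ADD (R3), R1
memory = {0x0: INSTRUCTION_WORD, 0x200: 5}
start = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0, "Y": 0, "Z": 0,
         "R1": 37, "R3": 0x200}
end = run_sequence(ADD_R3_R1, start, memory)
```

After the sequence, R1 holds the old R1 plus the word at the address in R3, the PC has advanced by 4, and IR holds the fetched instruction word.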

Branch instructions

□ Recall that the updated contents of the PC are copied into
Register Y in Step 2.

♦ Not necessary for the ADD instruction, but useful in BRANCH
instructions.

♦ Branch target address is computed by adding the updated
contents of the PC to an offset.

□ Copying the updated contents of the PC to Register Y
speeds up the execution of the BRANCH instruction.

□ Since the Fetch cycle is the same for all instructions, this
step is performed for all instructions.

♦ Since Register Y is not used for any other purpose at that time,
it does not have any impact on the execution of the instruction.



Unconditional Branch instructions

Step  Action

1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End

Figure 7.7 Control sequence for an unconditional
Branch instruction.

Conditional Branch instructions

Consider now a conditional branch. In this case, we need to check the status of the
condition codes before loading a new value into the PC. For example, for a Branch-on-
negative (Branch<0) instruction, step 4 in Figure 7.7 is replaced with

Offset-field-of-IRout, Add, Zin, If N = 0 then End

Thus, if N = 0 the processor returns to step 1 immediately after step 4. If N = 1,
step 5 is performed to load a new value into the PC, thus performing the branch
operation.
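
Steps 4 and 5 of the Branch<0 sequence can be summarised in a few lines. This is a sketch under assumptions: the function name `branch_if_negative`, the 32-bit wrap-around, and returning the number of control steps used are hypothetical choices for illustration.

```python
MASK = 0xFFFFFFFF


def branch_if_negative(pc, offset, n_flag):
    """Steps 4 and 5 of Branch<0; `pc` is the updated PC already held in Y."""
    y = pc
    z = (y + offset) & MASK  # Step 4: Offset-field-of-IRout, Add, Zin
    if n_flag == 0:          # Step 4: If N = 0 then End
        return pc, 4         # PC unchanged; the sequence ends after 4 steps
    return z, 5              # Step 5: Zout, PCin, End


taken = branch_if_negative(0x104, 0x20, n_flag=1)
not_taken = branch_if_negative(0x104, 0x20, n_flag=0)
backward = branch_if_negative(0x104, -8 & MASK, n_flag=1)  # negative offset
```

Because Y already holds the updated PC from Step 2, the target address needs only one ALU step.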